49 research outputs found

    Non-distributional Word Vector Representations

    Full text link
    Data-driven representation learning for words is a technique of central importance in NLP. While indisputably useful as a source of features in downstream tasks, such vectors tend to consist of uninterpretable components whose relationship to the categories of traditional lexical semantic theories is tenuous at best. We present a method for constructing interpretable word vectors from hand-crafted linguistic resources like WordNet, FrameNet, etc. These vectors are binary (i.e., contain only 0 and 1) and are 99.9% sparse. We analyze their performance on state-of-the-art evaluation methods for distributional models of word vectors and find they are competitive with standard distributional approaches. Comment: Proceedings of ACL 2015
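
    A minimal sketch of the construction, assuming WordNet (via NLTK) as the only linguistic resource (the paper also draws on FrameNet and others): each dimension is a symbolic feature such as membership in a synset or possession of a hypernym, and a word's vector is 1 exactly in the dimensions whose features apply to it. The function names below are illustrative, not from the paper.

        import numpy as np
        from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

        def wordnet_features(word):
            """Collect symbolic features for a word from WordNet."""
            feats = set()
            for synset in wn.synsets(word):
                feats.add("synset=" + synset.name())
                feats.add("pos=" + synset.pos())
                for hyper in synset.hypernyms():
                    feats.add("hypernym=" + hyper.name())
            return feats

        def build_binary_vectors(vocab):
            """Map each word to a binary vector over the union of all features."""
            word_feats = {w: wordnet_features(w) for w in vocab}
            all_feats = sorted(set().union(*word_feats.values()))
            index = {f: i for i, f in enumerate(all_feats)}
            vectors = {}
            for w, feats in word_feats.items():
                v = np.zeros(len(index), dtype=np.int8)  # binary: only 0 and 1
                for f in feats:
                    v[index[f]] = 1
                vectors[w] = v
            return vectors

        vecs = build_binary_vectors(["dog", "cat", "car"])
        print({w: int(v.sum()) for w, v in vecs.items()})  # very few 1s per word

    Because each word activates only the handful of features that hold for it, the resulting vectors are extremely sparse, matching the 99.9% sparsity the abstract reports.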

    Correlation-based Intrinsic Evaluation of Word Vector Representations

    Full text link
    We introduce QVEC-CCA, an intrinsic evaluation metric for word vector representations based on correlations of learned vectors with features extracted from linguistic resources. We show that QVEC-CCA scores are an effective proxy for a range of extrinsic semantic and syntactic tasks. We also show that the proposed evaluation obtains higher and more consistent correlations with downstream tasks, compared to existing approaches to intrinsic evaluation of word vectors that are based on word similarity. Comment: RepEval 2016, 5 pages
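
    A minimal sketch of a QVEC-CCA-style score, assuming an embedding matrix and a matrix of linguistic features (e.g., binary supersense indicators) whose rows are aligned over the same vocabulary. scikit-learn's CCA stands in here for the authors' implementation, and the score is the Pearson correlation between the first pair of canonical variates; the toy data is illustrative only.

        import numpy as np
        from scipy.stats import pearsonr
        from sklearn.cross_decomposition import CCA

        def qvec_cca_score(embeddings, ling_features):
            """Correlate learned vectors with linguistic features via CCA."""
            cca = CCA(n_components=1)
            x_c, y_c = cca.fit_transform(embeddings, ling_features)
            r, _ = pearsonr(x_c[:, 0], y_c[:, 0])  # first canonical correlation
            return r

        # Toy usage: 100 words, 50-dim embeddings, 10 binary linguistic features.
        rng = np.random.default_rng(0)
        embeddings = rng.normal(size=(100, 50))
        features = rng.integers(0, 2, size=(100, 10)).astype(float)
        print(qvec_cca_score(embeddings, features))

    A single scalar in [-1, 1] makes the metric easy to compare across embedding models, which is what lets it serve as a proxy for downstream performance.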

    Automatic correction of disfluent spoken queries

    Get PDF
    A user’s interaction with a virtual assistant typically involves spoken requests, queries, and commands, which often include disfluencies. This disclosure describes techniques to automatically correct disfluent queries. Per techniques of this disclosure, a disfluency correction machine learning model is utilized to convert a disfluent query to a corresponding fluent query. Lexical features extracted from the disfluent query are utilized to determine the portion of the query that is removed to convert it to a fluent query. The model is trained using pairs of disfluent and fluent queries.
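
    One way to realize the described training setup, sketched below under the assumption that the model is framed as per-token deletion tagging: each (disfluent, fluent) query pair is converted into KEEP/DELETE labels by a greedy alignment, and any sequence tagger can then be trained on those labels. The alignment heuristic and names are illustrative, not from the disclosure.

        def deletion_labels(disfluent_tokens, fluent_tokens):
            """Greedily align the fluent query against the disfluent one;
            unmatched disfluent tokens are the portion to be removed."""
            labels, j = [], 0
            for tok in disfluent_tokens:
                if j < len(fluent_tokens) and tok == fluent_tokens[j]:
                    labels.append("KEEP")
                    j += 1
                else:
                    labels.append("DELETE")
            return labels

        pair = ("show me uh show me flights to boston", "show me flights to boston")
        disfluent, fluent = (q.split() for q in pair)
        print(list(zip(disfluent, deletion_labels(disfluent, fluent))))
        # [('show', 'KEEP'), ('me', 'KEEP'), ('uh', 'DELETE'), ('show', 'DELETE'),
        #  ('me', 'DELETE'), ('flights', 'KEEP'), ('to', 'KEEP'), ('boston', 'KEEP')]

    Deleting the tokens labeled DELETE reproduces the fluent query, so the labels are a valid supervision signal even when the heuristic keeps the reparandum rather than the repair.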

    Contextual Error Correction in Automatic Speech Recognition

    Get PDF
    This disclosure describes techniques that leverage the context of a conversation between a user and a virtual assistant to correct errors in automatic speech recognition (ASR). Once confirmed by the user, the correction event is used to augment the training data for ASR.
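
    A minimal sketch of one way such context could be applied, assuming the ASR system exposes an n-best list of scored hypotheses and the conversational context is reduced to a bag of recently mentioned terms; the additive boost is illustrative, not the scheme from the disclosure.

        def rescore_with_context(nbest, context_terms, boost=2.0):
            """Pick the hypothesis whose base score plus context overlap is highest."""
            def contextual_score(hyp, base_score):
                overlap = len(set(hyp.split()) & context_terms)
                return base_score + boost * overlap
            return max(nbest, key=lambda h: contextual_score(h[0], h[1]))[0]

        # The user has been discussing a flight to Boston; the acoustically
        # preferred hypothesis contains a recognition error ("bostock").
        context = {"flight", "boston", "tomorrow"}
        nbest = [("call bostock tomorrow", -3.1), ("call boston tomorrow", -3.4)]
        print(rescore_with_context(nbest, context))  # "call boston tomorrow"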

    Learning Word Representations with Hierarchical Sparse Coding

    Full text link
    We propose a new method for learning word representations using hierarchical regularization in sparse coding, inspired by the linguistic study of word meanings. We show an efficient learning algorithm based on stochastic proximal methods that is significantly faster than previous approaches, making it possible to perform hierarchical sparse coding on a corpus of billions of word tokens. Experiments on various benchmark tasks (word similarity ranking, analogies, sentence completion, and sentiment analysis) demonstrate that the method outperforms or is competitive with state-of-the-art methods. Our word representations are available at http://www.ark.cs.cmu.edu/dyogatam/wordvecs/.
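
    A minimal sketch of one stochastic proximal update under this kind of regularizer, assuming a tree-structured group-lasso penalty whose proximal operator can be computed by group soft-thresholding from the leaves to the root (Jenatton et al., 2011); the dictionary, toy hierarchy, and step sizes below are illustrative, not the paper's configuration.

        import numpy as np

        def group_shrink(a, idx, tau):
            """Soft-threshold the sub-vector a[idx] as a group (group-lasso prox)."""
            norm = np.linalg.norm(a[idx])
            a[idx] = 0.0 if norm <= tau else a[idx] * (1 - tau / norm)

        def prox_tree(a, groups, tau):
            """Prox of the hierarchical norm: shrink groups from leaves to root."""
            for idx in groups:  # groups must be ordered leaves -> root
                group_shrink(a, idx, tau)
            return a

        def proximal_step(x, D, a, lam, lr, groups):
            """One proximal-gradient step on the code for a single example x
            (a stochastic step when x is sampled from the corpus)."""
            grad = D.T @ (D @ a - x)  # gradient of 0.5 * ||x - D a||^2
            return prox_tree(a - lr * grad, groups, lr * lam)

        rng = np.random.default_rng(0)
        D = rng.normal(size=(20, 6))  # dictionary: 20-dim data, 6 code entries
        x = rng.normal(size=20)
        a = np.zeros(6)
        # Toy hierarchy over code indices: two leaf groups, then the root group.
        groups = [np.array([0, 1, 2]), np.array([3, 4, 5]), np.arange(6)]
        for _ in range(200):
            a = proximal_step(x, D, a, lam=0.1, lr=0.01, groups=groups)
        print(a)  # entire subtrees of the code are driven exactly to zero

    Because the tree prox zeroes out whole groups at once, a code entry can be nonzero only if its ancestors are, which is the hierarchical structure the regularizer is meant to impose.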